Mining Textual Stream with Partial Labeled Instances Using Ensemble Framework
نویسندگان
چکیده
Increasing access to large-scale, high-dimensional and non-stationary streams in many real applications has made it necessary to design new dynamic classification algorithms. Most existing approaches for the textual stream classification are able to train the model relying on labeled data. However, only a limited number of instances can be labeled in a real streaming environment since large-scale data appear at a high speed. Therefore, it is useful to make unlabeled instances available for training and updating the ensemble models. In this paper, we present a new ensemble framework with partial labeled instances for learning from the textual stream. A new semi-supervised cluster-based classifier is proposed as the subclassifier in our approach. In order to integrate these sub-classifiers, we propose an adaptive selection method. Empirical evaluation of textual streams reveals that our approach outperforms state-of-the-art stream classification algorithms.
منابع مشابه
A Dynamic Ensemble Framework for Mining Textual Streams with Class Imbalance
Textual stream classification has become a realistic and challenging issue since large-scale, high-dimensional, and non-stationary streams with class imbalance have been widely used in various real-life applications. According to the characters of textual streams, it is technically difficult to deal with the classification of textual stream, especially in imbalanced environment. In this paper, ...
متن کاملDetecting Concept Drift in Data Stream Using Semi-Supervised Classification
Data stream is a sequence of data generated from various information sources at a high speed and high volume. Classifying data streams faces the three challenges of unlimited length, online processing, and concept drift. In related research, to meet the challenge of unlimited stream length, commonly the stream is divided into fixed size windows or gradual forgetting is used. Concept drift refer...
متن کاملNaïve Bayes Classification Ensembles to Support Modeling Decisions in Data Stream Mining
Data stream mining is the process of applying data mining methods to a data stream in real-time in order to create descriptive or predictive models. Due to the dynamic nature of data streams, new classes may emerge as a data stream evolves, and the concept being modeled may change with time. This gives rise to the need to continuously make revisions to the predictive model. Revising the predict...
متن کاملA Semi-supervised Ensemble Approach for Mining Data Streams
There are many challenges in mining data streams, such as infinite length, evolving nature and lack of labeled instances. Accordingly, a semi-supervised ensemble approach for mining data streams is presented in this paper. Data streams are divided into data chunks to deal with the infinite length. An ensemble classification model E is trained with existing labeled data chunks and decision bound...
متن کاملA Data Intensive Multi-chunk Ensemble Technique to Classify Stream Data Using Map-Reduce Framework
We propose a data intensive and distributed multichunk ensemble classifier based data mining technique to classify data streams. In our approach, we combine r most recent consecutive data chunks with data chunks in the current ensemble and generate a new ensemble using this data for training. By introducing this multi-chunk ensemble technique in a Map-Reduce framework and considering the concep...
متن کامل